I.  Introduction


In this note we consider the "two class" problem of statistical classification: We are given two random variables, $X_1$ and $X_2$, taking values in $\mathbb{R}^d$, together with some (usually incomplete) information about their distributions. We assume occurrences of type 1 ($X_1$) and of type 2 ($X_2$) are mutually exclusive and have prior probabilities of $\alpha$ and $1-\alpha$ respectively ($0 < \alpha < 1$). If $x$ is observed, how do we decide if $x$ is of type 1 or of type 2 in such a fashion as to minimize the probability of making an incorrect decision?

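If the probability densities of $X_1$ and $X_2$, $p_1(y)$ and $p_2(y)$, were known we would decide by using the likelihood ratio test:

$$\frac{(1-\alpha)\,p_2(x)}{\alpha\,p_1(x)} > 1 \;\Rightarrow\; \text{type 2}, \qquad \frac{(1-\alpha)\,p_2(x)}{\alpha\,p_1(x)} < 1 \;\Rightarrow\; \text{type 1}.$$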
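As an illustration of the likelihood ratio test, the following minimal Python sketch decides the type of an observation when both densities are known. The normal densities used here and the helper names `normal_pdf` and `likelihood_ratio_decision` are assumptions made for the example only; they do not appear in the note.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def likelihood_ratio_decision(x, p1, p2, alpha):
    """Return 2 if (1 - alpha) * p2(x) > alpha * p1(x), otherwise 1.

    alpha is the prior probability of type 1.
    """
    return 2 if (1.0 - alpha) * p2(x) > alpha * p1(x) else 1


# Hypothetical densities: type 1 ~ N(0, 1), type 2 ~ N(1, 1).
p1 = lambda x: normal_pdf(x, 0.0, 1.0)
p2 = lambda x: normal_pdf(x, 1.0, 1.0)

if __name__ == "__main__":
    for x in (0.2, 0.9):
        print(x, "-> type", likelihood_ratio_decision(x, p1, p2, alpha=0.5))
```

With equal variances and equal priors the decision boundary of this example lies at $x = 1/2$; raising $\alpha$ moves the boundary toward type 2.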

Unfortunately  in  many  practical  situations  (good  estimates  of)  the  probability 
densities  are  unavailable.  However  (good  estimates  of)  other  statistics  are 
available  (lower  order  moments,  spectral  estimates,  features,  etc.).  These 
enable  one  to  construct  a  family  of  discriminant  functions  L  whose  errors  may 
be  (estimated)  calculated  from  the  (estimated)  known  statistics.  We  give  a 
formal  definition  of  a  discriminant  function  and  its  error  as  follows. 

Def. 1  A discriminant function $L$ is a mapping $L: \mathbb{R}^d \to \mathbb{R}$. The error of $L$ (relative to the above classification problem) is the infimum over all real $t$ of the expected probability of error of the decision rule

$$L(x) > t \;\Rightarrow\; \text{type 2}, \qquad L(x) \le t \;\Rightarrow\; \text{type 1}.$$

The error of $L$ is given by the expression*

$$\inf_{-\infty < t < +\infty} \Big[ \alpha\,\mathrm{Prob}_1\{L(X_1) > t\} + (1-\alpha)\,\mathrm{Prob}_2\{L(X_2) \le t\} \Big].$$


Given a class of discriminants $\mathcal{L}$ our goal is to find an $L \in \mathcal{L}$ of minimum error. In II we will assume that the distributions of $L \in \mathcal{L}$ under each hypothesis may be parametrized by $k$ parameters. Necessary conditions for minimum error will be derived. In III certain properties of the normal distribution relative to the framework of II will be discussed. Finally a third order solution for the optimal linear discriminant will be given in IV.

II.  Necessary Conditions for k-th Order Solutions

Let $\mathcal{P}$ be a class of continuous probability densities on the real line parametrized by their means, variances, third moments, ..., $k$-th moments about the mean $(v^1, v^2, v^3, \ldots, v^k)$. We assume further that $\mathcal{P}$ is a location family:

$$\mu(v^1 - u^1, v^2, v^3, \ldots, v^k)(x) = \mu(v^1, v^2, v^3, \ldots, v^k)(x + u^1).$$

Consider any two densities $\mu_1 = \mu(0, v_1^2, v_1^3, \ldots, v_1^k)$ and $\mu_2 = \mu(1, v_2^2, \ldots, v_2^k)$. Let $E_\alpha(\mu_1, \mu_2) = E_\alpha(\mu(0, v_1^2, \ldots), \mu(1, v_2^2, \ldots))$ be the error of the identity discriminant function in $\mathbb{R}^1$ (for the two class problem with densities $\mu_1$, $\mu_2$):

$$E_\alpha(\mu_1, \mu_2) = \inf_t \left\{ (1-\alpha) \int_{-\infty}^{t} \mu_2 \, dx + \alpha \int_{t}^{+\infty} \mu_1 \, dx \right\}.$$

*See Footnote (1).

Assume further that $E_\alpha(\mu_1, \mu_2) = E_\alpha(v_1^2, v_1^3, \ldots, v_1^k;\ v_2^2, v_2^3, \ldots, v_2^k)$ has continuous partials wrt $v_1^2, v_1^3, \ldots, v_2^k$.

Def. 2  $\mathcal{P}$ is said to have monotone error at the pair $(\mu_1, \mu_2)$ if $E_\alpha(v_1^2, v_1^3, \ldots, v_2^k)$ has a non-vanishing gradient at $v_1^2, v_1^3, \ldots, v_2^k$.

Let $A \subset \mathbb{R}^q$ be some parameter space and let $\mathcal{L} = \{L(a);\ a \in A\}$ be a family of discriminant functions with

$$E_2(L(a)) - E_1(L(a)) = 1 \quad \text{for all } a.$$

Suppose the probability density of $L(a)$ under each hypothesis lies in $\mathcal{P}$ and the mapping $a \to E_\alpha(v_1^2(a), \ldots, v_1^k(a);\ v_2^2(a), \ldots, v_2^k(a))$ has partial derivatives of the first order. ($v_i^j(a)$ is the $j$-th moment about the mean of the random variable $L(a)$ under hypothesis $i$.)


Theorem 1  Let $\mathcal{P}$ have monotone error at $(\mu(0, v_1^2(a'), \ldots, v_1^k(a')),\ \mu(1, v_2^2(a'), \ldots, v_2^k(a')))$. If $E_\alpha(v_1^2(a), \ldots, v_1^k(a);\ v_2^2(a), \ldots, v_2^k(a))$ has a local minimum at $a'$, then $a'$ is a critical point of

$$\sum_{i=1}^{2} \sum_{j=2}^{k} \beta_i^j\, v_i^j(a)$$

for some set of $2(k-1)$ real numbers $\beta_i^j$ (not all 0) with $-1 \le \beta_i^j \le +1$. If $E_\alpha(v_1^2, \ldots, v_2^k)$ is strictly concave (as a function of $v_1^2, \ldots, v_2^k$) at $v_1^2(a'), \ldots, v_2^k(a')$, then the above critical point $a'$ is a strict local minimum of $\sum_{i=1}^{2} \sum_{j=2}^{k} \beta_i^j\, v_i^j(a)$.


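Proof  Taking partial derivatives of $E_\alpha$ wrt $a$ at $a'$ we have

$$\sum_{i=1}^{2} \sum_{j=2}^{k} \frac{\partial E_\alpha}{\partial v_i^j}\bigg|_{a'} \nabla_a v_i^j(a') = 0.$$

Since the gradient of $E_\alpha$ wrt $v_1^2, \ldots, v_2^k$ is non-zero at $v_1^2(a'), \ldots, v_2^k(a')$, we may set

$$\beta_i^j = \frac{\partial E_\alpha}{\partial v_i^j}\bigg|_{a'} \Bigg/ \sum_{l=1}^{2} \sum_{m=2}^{k} \left| \frac{\partial E_\alpha}{\partial v_l^m}\bigg|_{a'} \right| .$$

Then $a'$ is a critical point of $\sum_{i=1}^{2} \sum_{j=2}^{k} \beta_i^j\, v_i^j(a)$.

Suppose $E_\alpha(v_1^2, \ldots, v_2^k)$ is strictly concave at $v_1^2(a'), \ldots, v_2^k(a')$. Since the partial derivatives of $E_\alpha$ wrt $v_i^j$ are continuous, $E_\alpha(v_1^2, \ldots, v_2^k)$ has a differential at $v_1^2(a'), \ldots, v_2^k(a')$. Hence for any direction $u$ ($\|u\| = 1$),

$$\frac{\partial E_\alpha}{\partial u}\bigg|_{a'} = \big(\operatorname{grad} E_\alpha\big|_{a'}\big) \cdot u .$$

By strict concavity

$$E_\alpha(v_i^j(a') + \rho u) - E_\alpha(v_i^j(a')) < \rho\, \big(\operatorname{grad} E_\alpha\big|_{a'}\big) \cdot u$$

for $\rho$ sufficiently small, but positive. Hence for $a$ sufficiently close to $a'$, but unequal $a'$,

$$0 \le E_\alpha(v_i^j(a)) - E_\alpha(v_i^j(a')) = E_\alpha\big(v_i^j(a') + (v_i^j(a) - v_i^j(a'))\big) - E_\alpha(v_i^j(a')) < \sum_{i=1}^{2} \sum_{j=2}^{k} \frac{\partial E_\alpha}{\partial v_i^j}\bigg|_{a'} \big(v_i^j(a) - v_i^j(a')\big).$$

Hence $a'$ is a strict local minimum of $\sum_{i=1}^{2} \sum_{j=2}^{k} \beta_i^j\, v_i^j(a)$.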


The preceding theorem allows us to reduce the parameters in our problem from $q$ to $2(k-1)$ as follows: For any choice of reals $\beta_i^j$, $-1 \le \beta_i^j \le 1$, we find a set of critical points $A(\beta_i^j)$ of $\sum_{i=1}^{2} \sum_{j=2}^{k} \beta_i^j\, v_i^j(a)$. Then the $L \in \mathcal{L}$ of minimum error is in the set

$$\{ L(a) : a \in A(\beta_i^j),\ -1 \le \beta_i^j \le 1 \}.$$

For each such $L$ in the above set we (estimate) calculate the error from (performance on sample data) knowledge of the densities in $\mathcal{P}$. The $L$ of minimum error is then found by a numerical search in the $2(k-1)$ dimensional set described by the $\beta_i^j$. Knowledge that the above critical points are indeed strict local minima may be extremely useful for numerical purposes since the number of critical points of $\sum_{i=1}^{2} \sum_{j=2}^{k} \beta_i^j\, v_i^j(a)$ may be prohibitively large but the number of strict local minima computationally feasible. This will be the case in IV. Hence the concavity condition may be extremely important. For this reason we discuss the strict concavity of $E_\alpha(v_1^2, v_2^2)$ for $k = 2$ and $\mathcal{P}$ the set of normal distributions in III.

For the case $k = 2$ (second order solution) we may reduce our problem to one with a single parameter: determine critical points of $\beta\, v_1^2(a) + (1 - |\beta|)\, v_2^2(a)$ for $-1 \le \beta \le +1$. This was shown in (1) and (2) and applied to various classes of discriminant functions.
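The single-parameter search for $k = 2$ can be sketched numerically. The Python example below is a minimal illustration under assumptions that are not in the note: the data are two-dimensional, class 1 has mean 0 and covariance `S1`, class 2 has mean `m` and covariance `S2`, the discriminant is linear ($L(a) = a \cdot x$ with $a \cdot m = 1$), and the error of each candidate is evaluated as if its projections were normal. All function names are hypothetical.

```python
import math


def norm_sf(u):
    """Upper tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))


def normal_error(w, z, alpha, steps=2000):
    """Grid approximation of the minimum over t of
    alpha * P(N(0, w) > t) + (1 - alpha) * P(N(1, z) <= t)."""
    best = min(alpha, 1.0 - alpha)  # always deciding one type (t = -inf or +inf)
    for s in range(steps + 1):
        t = -3.0 + 7.0 * s / steps  # thresholds on [-3, 4]
        err = (alpha * norm_sf(t / math.sqrt(w))
               + (1.0 - alpha) * (1.0 - norm_sf((t - 1.0) / math.sqrt(z))))
        best = min(best, err)
    return best


def quad(a, S):
    """Quadratic form a' S a, the variance of a.x under covariance S."""
    return sum(a[i] * S[i][j] * a[j] for i in range(2) for j in range(2))


def critical_direction(beta, S1, S2, m):
    """Critical point of beta*a'S1a + (1-|beta|)*a'S2a subject to a.m = 1."""
    g = 1.0 - abs(beta)
    M = [[beta * S1[i][j] + g * S2[i][j] for j in range(2)] for i in range(2)]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if abs(det) < 1e-12:
        return None  # no isolated critical point for this beta
    u = [(M[1][1] * m[0] - M[0][1] * m[1]) / det,
         (M[0][0] * m[1] - M[1][0] * m[0]) / det]  # u solves M u = m
    scale = u[0] * m[0] + u[1] * m[1]
    if abs(scale) < 1e-12:
        return None
    return [u[0] / scale, u[1] / scale]


def search_beta(S1, S2, m, alpha, n=200):
    """Scan beta over [-1, 1] and keep the candidate of smallest error."""
    best = None
    for s in range(n + 1):
        beta = -1.0 + 2.0 * s / n
        a = critical_direction(beta, S1, S2, m)
        if a is None:
            continue
        err = normal_error(quad(a, S1), quad(a, S2), alpha)
        if best is None or err < best[2]:
            best = (beta, a, err)
    return best


if __name__ == "__main__":
    S1 = [[1.0, 0.3], [0.3, 2.0]]
    S2 = [[3.0, -0.5], [-0.5, 0.5]]
    print(search_beta(S1, S2, [1.0, 1.0], alpha=0.5))
```

The value $\beta = 1/2$ is part of the scan, so the returned candidate is never worse (under the normality assumption of this sketch) than the one obtained with equal weights.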

The above results are completely analogous when we parametrize $\mathcal{P}$ by statistics other than moments. The choice of such statistics will influence considerably the performance of a $k$-th order solution.

III.  Some Remarks on the Normal Distribution

Let $\mathcal{P}$ be the 2-parameter class of normals. For convenience denote the two variance parameters, $v_1^2$ and $v_2^2$, by $w$ and $z$ respectively. Let

$$E_\alpha^c(w, z) = \int_{c}^{+\infty} \frac{\alpha}{\sqrt{2\pi w}} \exp\left(\frac{-x^2}{2w}\right) dx + \int_{-\infty}^{c} \frac{1-\alpha}{\sqrt{2\pi z}} \exp\left(\frac{-(x-1)^2}{2z}\right) dx .$$

Then

$$E_\alpha(w, z) = \inf_c E_\alpha^c(w, z) = E_\alpha^{c(w,z)}(w, z)$$

where

$$c(w, z) = \frac{\dfrac{2}{z} - \sqrt{\dfrac{4}{z^2} - 4\left(\dfrac{1}{z} - \dfrac{1}{w}\right)\left(\dfrac{1}{z} + \log\dfrac{\alpha^2 z}{(1-\alpha)^2 w}\right)}}{2\left(\dfrac{1}{z} - \dfrac{1}{w}\right)} \quad \text{for } z \ne w,$$

$$c(w, z) = w \log\frac{\alpha}{1-\alpha} + \frac{1}{2} \quad \text{for } z = w .$$

The function $c(w, z)$ has a Taylor expansion about any point of the form $(w, w)$ ($w > 0$). Hence $c(w, z)$ has continuous partial derivatives of the first order in $z$ and $w$. It represents the smaller root of the equation $\alpha\,\mu(0, w) = (1-\alpha)\,\mu(1, z)$ for $z < w$, the larger root for $z > w$, and the only root for $z = w$.
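The threshold $c(w, z)$ can be checked numerically. The following Python sketch solves the quadratic obtained by taking logarithms of $\alpha\,\mu(0, w) = (1-\alpha)\,\mu(1, z)$ and evaluates $E_\alpha^c(w, z)$ through the complementary error function. The function names are hypothetical, and the sketch assumes that the quadratic has a real root (it returns `None` otherwise).

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def norm_sf(u):
    """Upper tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))


def threshold_c(w, z, alpha):
    """Root of alpha*mu(0, w)(x) = (1 - alpha)*mu(1, z)(x) selected in the text."""
    if abs(z - w) < 1e-12:
        return w * math.log(alpha / (1.0 - alpha)) + 0.5
    A = 1.0 / z - 1.0 / w
    K = math.log(alpha ** 2 * z / ((1.0 - alpha) ** 2 * w))
    disc = 4.0 / z ** 2 - 4.0 * A * (1.0 / z + K)
    if disc < 0.0:
        return None  # the two weighted densities do not cross
    return (2.0 / z - math.sqrt(disc)) / (2.0 * A)


def error_at_threshold(c, w, z, alpha):
    """E_alpha^c(w, z): type 1 mass above c plus type 2 mass below c."""
    return (alpha * norm_sf(c / math.sqrt(w))
            + (1.0 - alpha) * (1.0 - norm_sf((c - 1.0) / math.sqrt(z))))


if __name__ == "__main__":
    w, z, alpha = 1.0, 2.0, 0.5
    c = threshold_c(w, z, alpha)
    print("c =", c, "error =", error_at_threshold(c, w, z, alpha))
```

For $z$ close to $w$ the general formula subtracts nearly equal quantities, which is why the sketch switches to the closed form for $z = w$ when the two variances agree to within a small tolerance.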

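Lemma 1  $E_\alpha(w, z)$ has continuous partial derivatives of the first order given by the formulae

$$\frac{\partial E_\alpha}{\partial w} = \int_{c(w,z)}^{+\infty} \frac{\partial}{\partial w}\left[ \frac{\alpha}{\sqrt{2\pi w}} \exp\left(\frac{-x^2}{2w}\right) \right] dx$$

$$\frac{\partial E_\alpha}{\partial z} = \int_{-\infty}^{c(w,z)} \frac{\partial}{\partial z}\left[ \frac{1-\alpha}{\sqrt{2\pi z}} \exp\left(\frac{-(x-1)^2}{2z}\right) \right] dx$$

Proof: From the formulae for $\partial E_\alpha / \partial w$ and $\partial E_\alpha / \partial z$ it follows that the derivatives are continuous in $w$, $z$. Hence we need only derive the formulae. We derive the expression for $\partial E_\alpha / \partial w$; $\partial E_\alpha / \partial z$ is derived analogously. Consider

$$\frac{E_\alpha^{c(w',z)}(w', z) - E_\alpha^{c(w,z)}(w, z)}{w' - w}$$

$$= \frac{1}{w' - w} \int_{c(w,z)}^{+\infty} \left[ \frac{\alpha}{\sqrt{2\pi w'}} \exp\left(\frac{-x^2}{2w'}\right) - \frac{\alpha}{\sqrt{2\pi w}} \exp\left(\frac{-x^2}{2w}\right) \right] dx$$

$$+ \frac{1}{w' - w} \int_{c(w',z)}^{c(w,z)} \left[ \frac{\alpha}{\sqrt{2\pi w'}} \exp\left(\frac{-x^2}{2w'}\right) - \frac{1-\alpha}{\sqrt{2\pi z}} \exp\left(\frac{-(x-1)^2}{2z}\right) \right] dx .$$

For $\varepsilon > 0$ and $w'$ sufficiently close to $w$, the integrand in the second term of the previous expression will be of magnitude less than $\varepsilon$. Hence the second term is bounded in absolute value by

$$\frac{|c(w', z) - c(w, z)|}{|w' - w|}\, \varepsilon ,$$

which converges to $\varepsilon\, |\partial c / \partial w|$ as $w' \to w$. Since $\varepsilon$ was arbitrary the second term converges to zero and the first term converges to the desired expression.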


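Lemma 2  $\mathcal{P}$ has monotone error at each pair $(w, z)$.

Proof: For fixed $w$, $z$

$$\frac{\partial E_\alpha}{\partial w} = A \int_{c(w,z)}^{+\infty} \left( \frac{x^2}{w} - 1 \right) e^{-\frac{x^2}{2w}}\, dx$$

$$\frac{\partial E_\alpha}{\partial z} = B \int_{-\infty}^{c(w,z)} \left( \frac{(x-1)^2}{z} - 1 \right) e^{-\frac{(x-1)^2}{2z}}\, dx$$

where $A$ and $B$ are non-zero. The integrals evaluate to $c\, e^{-c^2/2w}$ and $(1-c)\, e^{-(1-c)^2/2z}$ respectively, with $c = c(w, z)$. The first integral vanishes only if $c(w, z) = 0$ and the second only if $c(w, z) = 1$. Hence both partial derivatives are not simultaneously zero.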
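The closed forms implied by Lemma 1 and Lemma 2, namely $\partial E_\alpha/\partial w = \alpha\, c\, e^{-c^2/2w} / (2\sqrt{2\pi}\, w^{3/2})$ and $\partial E_\alpha/\partial z = (1-\alpha)(1-c)\, e^{-(1-c)^2/2z} / (2\sqrt{2\pi}\, z^{3/2})$, can be compared with finite differences of $E_\alpha(w, z)$. The Python sketch below does this at one test point; the function names and the test point are assumptions made for the example, and a real root $c(w, z)$ is assumed to exist.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def norm_sf(u):
    """Upper tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))


def threshold_c(w, z, alpha):
    """Root c(w, z) of alpha*mu(0, w) = (1 - alpha)*mu(1, z) (real root assumed)."""
    if abs(z - w) < 1e-12:
        return w * math.log(alpha / (1.0 - alpha)) + 0.5
    A = 1.0 / z - 1.0 / w
    K = math.log(alpha ** 2 * z / ((1.0 - alpha) ** 2 * w))
    disc = 4.0 / z ** 2 - 4.0 * A * (1.0 / z + K)
    return (2.0 / z - math.sqrt(disc)) / (2.0 * A)


def error(w, z, alpha):
    """E_alpha(w, z), evaluated at the threshold c(w, z)."""
    c = threshold_c(w, z, alpha)
    return (alpha * norm_sf(c / math.sqrt(w))
            + (1.0 - alpha) * (1.0 - norm_sf((c - 1.0) / math.sqrt(z))))


def dE_dw(w, z, alpha):
    """Closed form of the partial derivative with respect to w."""
    c = threshold_c(w, z, alpha)
    return alpha * c * math.exp(-c * c / (2.0 * w)) / (2.0 * SQRT_2PI * w ** 1.5)


def dE_dz(w, z, alpha):
    """Closed form of the partial derivative with respect to z."""
    c = threshold_c(w, z, alpha)
    return ((1.0 - alpha) * (1.0 - c) * math.exp(-(1.0 - c) ** 2 / (2.0 * z))
            / (2.0 * SQRT_2PI * z ** 1.5))


def central_difference(f, x, h=1e-6):
    """Symmetric finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    w, z, alpha = 1.0, 2.0, 0.5
    print(dE_dw(w, z, alpha), central_difference(lambda t: error(t, z, alpha), w))
    print(dE_dz(w, z, alpha), central_difference(lambda t: error(w, t, alpha), z))
```

The dependence of $c(w, z)$ on $w$ and $z$ does not contribute to the derivatives because $\partial E_\alpha^c / \partial c$ vanishes at $c = c(w, z)$, which is the content of the limiting argument in the proof of Lemma 1.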


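Theorem 2  $E_\alpha(w, z)$ is strictly concave in the region described by the inequalities

$$0 < c(w, z) < 1, \qquad w > \tfrac{1}{3}\,[c(w, z)]^2, \qquad z > \tfrac{1}{3}\,[1 - c(w, z)]^2 .$$

Proof: Let $(w, z)$ lie in the above open region. We will show that $E_\alpha$ is strictly concave in a neighborhood of $(w, z)$. There are neighborhoods $N$ of $c(w, z)$ and $A$ of $(w, z)$ such that for any $c \in N$ and $(w, z) \in A$

$$0 < c < 1, \qquad w > \tfrac{1}{3}\, c^2, \qquad z > \tfrac{1}{3}\,(1-c)^2 .$$

Choose a neighborhood of $(w, z)$, $A^1 \subset A$, such that $c(w, z) \in N$ for all $(w, z) \in A^1$. Then on $A^1$

$$E_\alpha(w, z) = \inf_c E_\alpha^c(w, z) = \inf_{c \in N} E_\alpha^c(w, z) = E_\alpha^{c(w,z)}(w, z).$$

Now it may be easily shown that the infimum of a collection of strictly concave functions, defined in a common open domain, is strictly concave in that domain provided the infimum is assumed at each point in the domain by some element in the collection. Hence we need only show that $E_\alpha^c(w, z)$ is strictly concave in $A^1$ for all $c \in N$.

We have

$$E_\alpha^c(w, z) = \frac{\alpha}{\sqrt{2\pi}} \int_{c/\sqrt{w}}^{+\infty} e^{-\frac{x^2}{2}}\, dx + (1-\alpha)\left[ 1 - \frac{1}{\sqrt{2\pi}} \int_{(c-1)/\sqrt{z}}^{+\infty} e^{-\frac{x^2}{2}}\, dx \right]$$

$$\frac{\partial E_\alpha^c}{\partial w} = \frac{\alpha\, c\, w^{-3/2}}{2\sqrt{2\pi}}\, e^{-\frac{c^2}{2w}}$$

$$\frac{\partial^2 E_\alpha^c}{\partial w^2} = \frac{\alpha\, c\, w^{-7/2}}{4\sqrt{2\pi}}\, e^{-\frac{c^2}{2w}}\, (c^2 - 3w)$$

$$\frac{\partial E_\alpha^c}{\partial z} = \frac{(1-\alpha)(1-c)\, z^{-3/2}}{2\sqrt{2\pi}}\, e^{-\frac{(1-c)^2}{2z}}$$

$$\frac{\partial^2 E_\alpha^c}{\partial z^2} = \frac{(1-c)(1-\alpha)\, z^{-7/2}}{4\sqrt{2\pi}}\, e^{-\frac{(1-c)^2}{2z}}\, \big((1-c)^2 - 3z\big)$$

$$\frac{\partial^2 E_\alpha^c}{\partial w\, \partial z} = 0 .$$

Since the second partials wrt $w$ and $z$ are negative in $A^1$ and the mixed partial vanishes, $E_\alpha^c$ is strictly concave.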

By inspection of the second partials in the preceding proofs one determines immediately two other regions of strict concavity:

$$c < 0, \quad w < \tfrac{1}{3}\, c^2, \quad z > \tfrac{1}{3}\,(1-c)^2 \qquad \text{and} \qquad c > 1, \quad w > \tfrac{1}{3}\, c^2, \quad z < \tfrac{1}{3}\,(1-c)^2 .$$

Corollary 1  Let $\mathcal{L} = \{L(a);\ a \in A\}$ be a family of discriminant functions with the properties of section II whose densities are the two parameter family of normals. For convenience denote $v_1^2(a)$, $v_2^2(a)$ by $w(a)$, $z(a)$ respectively. If $a'$ is a local minimum of $E_\alpha(w(a), z(a))$ satisfying

$$0 < c(w(a'), z(a')) < 1$$

$$w(a') > \tfrac{1}{3}\,[c(w(a'), z(a'))]^2$$

$$z(a') > \tfrac{1}{3}\,[1 - c(w(a'), z(a'))]^2 ,$$

then $a'$ is a strict local minimum of $\beta\, w(a) + (1-\beta)\, z(a)$ for some $0 < \beta < 1$.

Proof  Since $E_\alpha$ is strictly concave at $(w(a'), z(a'))$ by Theorem 2, $a'$ is a strict local minimum of some weighted sum of $w(a)$ and $z(a)$, $\beta_1^2\, w(a) + \beta_2^2\, z(a)$. Let $c = c(w(a'), z(a'))$. From Lemma 1

$$\frac{\partial E_\alpha}{\partial w}\bigg|_{a'} = \frac{\partial E_\alpha^c}{\partial w}\bigg|_{a'}, \qquad \frac{\partial E_\alpha}{\partial z}\bigg|_{a'} = \frac{\partial E_\alpha^c}{\partial z}\bigg|_{a'} .$$

From the formulae in the proof of Theorem 2 these partial derivatives are both positive. Hence $\beta_1^2 = \beta$ and $\beta_2^2 = 1 - \beta$ for some $0 < \beta < 1$ from the formulae for $\beta_i^j$ in the proof of Theorem 1.

Corollary 2  Let $\mathcal{L}$ be as in Corollary 1. Let $a'$ be a local minimum of $E_\alpha(w(a), z(a))$. Let $\varepsilon_i$ be the probability of error of type $i$ for $L' = L(a')$. ($\varepsilon_1 = \mathrm{Prob}_1(L' > c)$, $\varepsilon_2 = \mathrm{Prob}_2(L' \le c)$.) Then if $\varepsilon_i$ satisfies the inequalities

$$.5 > \varepsilon_i > \int_{\sqrt{3}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\, dx \quad (\approx .04),$$

$a'$ is a strict local minimum of $\beta\, w(a) + (1-\beta)\, z(a)$ for some $0 < \beta < 1$.

Proof  Since the error probabilities of each type are less than .5, $0 < c < 1$. Also

$$\varepsilon_1 = \int_{c}^{\infty} \frac{1}{\sqrt{2\pi\, w(a')}}\, e^{-\frac{x^2}{2 w(a')}}\, dx = \int_{c/\sqrt{w(a')}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\, dx > \int_{\sqrt{3}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\, dx .$$

Hence $w(a') > \tfrac{1}{3}\, c^2$. Similarly $z(a') > \tfrac{1}{3}\,(1-c)^2$. By Corollary 1, $a'$ is a strict local minimum of $\beta\, w(a) + (1-\beta)\, z(a)$ for some $0 < \beta < 1$.
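The numerical bound in Corollary 2 is the standard normal tail probability beyond $\sqrt{3}$. The short Python sketch below evaluates it with the complementary error function and checks whether a pair of error probabilities falls in the range required by the corollary; the helper names are hypothetical.

```python
import math


def standard_normal_tail(u):
    """Probability that a standard normal variable exceeds u."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))


# Lower bound on the error probabilities in Corollary 2.
LOWER_BOUND = standard_normal_tail(math.sqrt(3.0))


def satisfies_corollary_2(eps1, eps2):
    """True if both error probabilities lie strictly between the bound and .5."""
    return all(LOWER_BOUND < e < 0.5 for e in (eps1, eps2))


if __name__ == "__main__":
    print("tail beyond sqrt(3):", LOWER_BOUND)
    print(satisfies_corollary_2(0.10, 0.20))
    print(satisfies_corollary_2(0.01, 0.20))
```

The condition fails when either error probability is already very small, since the corresponding variance then no longer exceeds one third of the squared distance from its mean to the threshold.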


IV.  Third  Order  Solution  for  the  Optimal  Linear  Discriminant 

Suppose $x_1, x_2, \ldots, x_d$ are uncorrelated real random variables under each hypothesis with

$$E_1(x_i) = 0, \qquad E_2(x_i) = 1, \qquad E_1(x_i^2) = \lambda_i^1, \qquad E_2((x_i - 1)^2) = \lambda_i^2$$

(the superscript on $\lambda$ indexes the hypothesis). In practical situations this can be achieved by applying the appropriate affine transformation to the data. Let

$$\mathcal{L} = \Big\{ \sum_{i=2}^{d} a_i x_i + \Big(1 - \sum_{i=2}^{d} a_i\Big) x_1 ;\ a_i \text{ real} \Big\}.$$

Finding $L \in \mathcal{L}$ of minimum error is equivalent (in the third order sense) to finding critical points of

$$\beta_1^2\, E_1\Big( \sum_{i=2}^{d} a_i x_i + \Big(1 - \sum_{i=2}^{d} a_i\Big) x_1 \Big)^2 + \beta_2^2\, E_2\Big( \sum_{i=2}^{d} a_i (x_i - 1) + \Big(1 - \sum_{i=2}^{d} a_i\Big)(x_1 - 1) \Big)^2$$

$$+\ \beta_1^3\, E_1\Big( \sum_{i=2}^{d} a_i x_i + \Big(1 - \sum_{i=2}^{d} a_i\Big) x_1 \Big)^3 + \beta_2^3\, E_2\Big( \sum_{i=2}^{d} a_i (x_i - 1) + \Big(1 - \sum_{i=2}^{d} a_i\Big)(x_1 - 1) \Big)^3$$

for various values of $\beta_i^j$, $-1 \le \beta_i^j \le +1$. This objective function is a cubic in $d - 1$ variables and possesses in general $2^{d-1}$ critical points. However -


Lemma 3  Let $f$ be a cubic in $d - 1$ dimensions. Then $f$ has at most one strict local minimum.

Proof: Suppose $f$ has two strict local minima, $x$ and $y$. Then $f$ restricted to the line $\{ s\,x + (1-s)\,y;\ -\infty < s < +\infty \}$ has strict local minima at $s = 0$ and $s = 1$. But the restriction of $f$ is a cubic in one dimension which has at most one strict local minimum.
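The one-dimensional fact used at the end of the proof can be made concrete. For $p(s) = c_3 s^3 + c_2 s^2 + c_1 s + c_0$ the derivative is a quadratic with at most two roots, and the second derivative has opposite signs at the two roots, so at most one of them is a strict local minimum. The Python sketch below lists the strict local minima of such a polynomial; the function name is hypothetical.

```python
import math


def cubic_strict_minima(c3, c2, c1):
    """Strict local minima of c3*s**3 + c2*s**2 + c1*s + c0 (c0 is irrelevant)."""
    if c3 == 0:
        # Degenerate case: a quadratic has one strict minimum iff it opens upward.
        return [-c1 / (2.0 * c2)] if c2 > 0 else []
    disc = 4.0 * c2 * c2 - 12.0 * c3 * c1  # discriminant of p'(s) = 3c3 s^2 + 2c2 s + c1
    if disc <= 0:
        return []  # no critical point, or a single inflection point
    roots = [(-2.0 * c2 + sign * math.sqrt(disc)) / (6.0 * c3) for sign in (1.0, -1.0)]
    # p''(s) = 6 c3 s + 2 c2 equals +sqrt(disc) at one root and -sqrt(disc) at the other.
    return [s for s in roots if 6.0 * c3 * s + 2.0 * c2 > 0]


if __name__ == "__main__":
    print(cubic_strict_minima(1.0, 0.0, -3.0))  # p(s) = s^3 - 3s
```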

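Suppose $E_\alpha$ is strictly concave at the values of the four parameters corresponding to the $L \in \mathcal{L}$ of minimum error. Then, by the preceding lemma, we need to determine at most one point in the domain of the objective function for each choice of $\beta_i^j$. In general the method of steepest descent will not yield a strict local minimum of a cubic since the cubic approaches both $\pm\infty$.

The objective function may be written

$$H(a) = \sum_{i=2}^{d} a_i^2 \big( \beta_1^2 E_1(x_i^2) + \beta_2^2 E_2((x_i - 1)^2) \big) + \Big(1 - \sum_{i=2}^{d} a_i\Big)^2 \big( \beta_1^2 E_1(x_1^2) + \beta_2^2 E_2((x_1 - 1)^2) \big)$$

$$+ \sum_{i=2}^{d} a_i^3 \big( \beta_1^3 E_1(x_i^3) + \beta_2^3 E_2((x_i - 1)^3) \big) + \Big(1 - \sum_{i=2}^{d} a_i\Big)^3 \big( \beta_1^3 E_1(x_1^3) + \beta_2^3 E_2((x_1 - 1)^3) \big)$$

$$= \sum_{i=2}^{d} A_i a_i^2 + \sum_{i=2}^{d} B_i a_i^3 + A_1 \Big(1 - \sum_{i=2}^{d} a_i\Big)^2 + B_1 \Big(1 - \sum_{i=2}^{d} a_i\Big)^3 .$$

A strict local minimum of $H$, $a$, corresponds to a critical point $(c, \phi)$ of

$$K(c, \phi) = \sum_{i=1}^{d} A_i c_i^2 + \sum_{i=1}^{d} B_i c_i^3 - \phi \Big( \sum_{i=1}^{d} c_i - 1 \Big)$$

where $\phi$ is a Lagrange multiplier and

$$1 - \sum_{i=2}^{d} a_i = c_1, \qquad a_2 = c_2, \ \ldots, \ a_d = c_d .$$

Recall $A_i > 0$ since $\beta_1^2 > 0$, $\beta_2^2 > 0$. Differentiating $K$ wrt $c_i$ and setting the result equal to zero yields

$$2 A_i c_i + 3 B_i c_i^2 = \phi .$$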


We attempt to solve the above system for $0 < \phi < \phi_{\max}$ subject to the constraint $\sum_{i=1}^{d} c_i = 1$ where*

$$\phi_{\max} = \min_{i;\, B_i < 0} \frac{A_i^2}{3|B_i|} = \frac{A_k^2}{3|B_k|},$$

taking $c_i$ to be the smallest positive root. For a given $\phi$,

$$c_i = \frac{-2A_i + \sqrt{4A_i^2 + 12 B_i \phi}}{6 B_i} \quad (B_i \ne 0), \qquad c_i = \frac{\phi}{2 A_i} \quad (B_i = 0).$$

*See Footnote (3).

Theorem 3  The system $2A_i c_i + 3B_i c_i^2 = \phi$ has a solution of smallest positive roots for positive $\phi$ with the roots summing to one provided**

$$\sum_{i=1}^{d} \frac{-2A_i + \sqrt{4A_i^2 + 4 B_i A_k^2 / |B_k|}}{6 B_i} > 1 .$$

In addition the corresponding $a$ is a strict local minimum of $H$.

**See Footnote (4).

Proof: For $\phi$ close to zero the sum of the roots will be less than one. For $\phi = \phi_{\max}$ the sum of the roots is greater than 1 by the condition of the theorem. Since the roots depend continuously on $\phi$, by the intermediate value theorem there is $\phi$ such that the corresponding roots sum to one.

Since $(c, \phi)$ is a critical point of $K$, the corresponding $a$ is a critical point of $H$. Hence $\nabla H$ is zero at $a$. To show that $a$ is a strict local minimum we compute the Hessian $J(H)$ of $H$ at $a$ and show that it is positive definite:

$$\frac{\partial^2 H}{\partial a_i^2} = 2A_i + 6B_i a_i + 2A_1 + 6B_1 \Big(1 - \sum_{i=2}^{d} a_i\Big) = \sqrt{4A_i^2 + 12 B_i \phi} + \sqrt{4A_1^2 + 12 B_1 \phi}$$

$$\frac{\partial^2 H}{\partial a_i\, \partial a_j} = 2A_1 + 6B_1 \Big(1 - \sum_{i=2}^{d} a_i\Big) = \sqrt{4A_1^2 + 12 B_1 \phi} \quad \text{for } i \ne j .$$

Since $\phi < \phi_{\max}$ all the radical terms are positive. Hence $J(H) = \Lambda + \Omega$ where $\Lambda$ is a diagonal matrix with positive eigenvalues and $\Omega$ is a matrix whose entries are a positive constant. Clearly such a matrix is positive definite. This completes the proof.
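The existence argument in the proof of Theorem 3 suggests a simple numerical procedure: the sum of the smallest positive roots increases with $\phi$, so the value of $\phi$ at which the roots sum to one can be located by bisection on $(0, \phi_{\max})$. The Python sketch below is one possible implementation under the assumption that all $A_i > 0$; the function names, the bisection itself and the doubling search used when no $B_i$ is negative are illustrative choices, not part of the note.

```python
import math


def smallest_positive_root(A, B, phi):
    """Smallest positive root c of 2*A*c + 3*B*c**2 = phi (A > 0, phi > 0)."""
    if B == 0:
        return phi / (2.0 * A)
    radicand = max(0.0, 4.0 * A * A + 12.0 * B * phi)  # guard against rounding
    return (-2.0 * A + math.sqrt(radicand)) / (6.0 * B)


def phi_max(As, Bs):
    """Largest phi for which every quadratic has a real root (+inf if no B_i < 0)."""
    limits = [A * A / (3.0 * abs(B)) for A, B in zip(As, Bs) if B < 0]
    return min(limits) if limits else math.inf


def solve_phi(As, Bs, iterations=200):
    """Return (phi, roots) with the roots summing to one, or None if the
    condition of Theorem 3 fails."""
    def total(phi):
        return sum(smallest_positive_root(A, B, phi) for A, B in zip(As, Bs))

    hi = phi_max(As, Bs)
    if math.isinf(hi):
        hi = 1.0
        while total(hi) < 1.0:  # the sum grows without bound when all B_i >= 0
            hi *= 2.0
    elif total(hi) <= 1.0:
        return None
    lo = 0.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if total(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    phi = 0.5 * (lo + hi)
    return phi, [smallest_positive_root(A, B, phi) for A, B in zip(As, Bs)]


if __name__ == "__main__":
    # Hypothetical coefficients A_i, B_i for d = 3.
    print(solve_phi([1.0, 2.0, 1.5], [0.5, -0.1, 0.0]))
```

The returned roots give the discriminant through $a_i = c_i$ for $i \ge 2$, with $c_1$ the coefficient of $x_1$.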


Footnotes

(1)  In some cases one uses as a measure of error the probability of misclassification of one type given the probability of misclassification of the other. For such an error function the results of II remain valid. The results of III and IV in this setting will be discussed in a future paper.

(2)  For $\beta_1^2 = \beta_2^2 = \tfrac{1}{2}$, this second order solution is known as the Fisher line.

(3)  If there are negative $B_i$'s, $\phi_{\max}$ is the largest $\phi$ for which each quadratic has a solution. If there are no negative $B_i$'s, we set $\phi_{\max} = +\infty$. Clearly each quadratic has a solution for $0 < \phi < \phi_{\max}$.

(4)  If $B_i = 0$ the $i$'th term is replaced by $\phi_{\max} / (2A_i)$. If all $B_i$ are non-negative the theorem holds without the inequality.